Abstract

Plastids are semiautonomous organelles that possess their own genome, inherited from a cyanobacterial ancestor. The
primary function of plastids is photosynthesis, so the structure and evolution of plastid genomes have been studied extensively in
photosynthetic plants. In contrast, little is known about the plastomes of nonphotosynthetic species. In higher plants, plastid
genome sequences are available for only three strictly nonphotosynthetic species: the liverwort Aneura mirabilis and two
flowering plants, Epifagus virginiana and Rhizanthella gardneri. We report here the complete sequence of the plastid genome
of the nonphotosynthetic mycoheterotrophic orchid Neottia nidus-avis, determined using 454 pyrosequencing technology. It
was found to be reduced in both genome size and gene content; this reduction is, however, not as drastic as in the other
nonphotosynthetic orchid, R. gardneri. The Neottia plastome lacks all genes encoding photosynthetic proteins and RNA polymerase
subunits but retains most genes of the translational apparatus. The genes that are retained have an increased rate of both
synonymous and nonsynonymous substitutions but do not exhibit relaxation of purifying selection in either Neottia or
Rhizanthella.
Introduction

Information on chloroplast genome sequences is essential
in different fields of plant biology: plant physiology,
population genetics, phylogenetics, and evolution. As a
consequence, the number of published chloroplast genome
sequences has greatly increased in the past three years, allowing an
assessment of general trends of their evolution (reviewed
in Wicke et al. 2011). In contrast, little is known about
the structure of the plastid genome in nonphotosynthetic 
species. In higher plants, complete plastid genome sequences
are now available for only three completely nonphotosynthetic
species: two angiosperms, the parasitic plant
Epifagus virginiana from Orobanchaceae (Wolfe et al.
1992) and the mycoheterotrophic orchid Rhizanthella gardneri
(Delannoy et al. 2011), and a mycoheterotrophic liverwort,
Aneura mirabilis (Wickett, Zhang, et al. 2008). Given that there are
about 3,000 nonphotosynthetic plants representing more
than ten families, this is obviously insufficient to infer
general patterns of plastome evolution in the absence
of photosynthetic activity. We have increased this set by
sequencing the entire plastome of a mycoheterotrophic
orchid, Neottia nidus-avis. Neottia belongs to the same family
as Rhizanthella, whose plastid genome was recently characterized
(Delannoy et al. 2011), but the shift to heterotrophy
occurred independently in these two lineages (Molvray
et al. 2000). Thus, this information is useful both for revealing
general features of nonphotosynthetic plant plastomes
and for comparing the plastid genome structure under the
parallel loss of photosynthetic activity in a specific plant
group, the family Orchidaceae.

Fig. 1. Circular map of the plastid genome of Neottia nidus-avis. Genes shown inside the circle are transcribed clockwise; those outside the circle
are transcribed counterclockwise. Asterisks indicate intron-containing genes; dark gray bars inside the inner circle indicate guanine-cytosine content.

The sequence of the Neottia plastome was assembled from
partial genomic DNA sequence produced using high-throughput
pyrosequencing technology (454 sequencing)
complemented with Sanger sequencing. As expected, the
plastid genome of Neottia is highly reduced in length (92
kb compared with 146-149 kb in photosynthetic orchids)
and in gene content (fig. 1). All genes encoding photosystem
I and II components are lost or pseudogenized. The
same is true for the genes of the cytochrome b6f components
and photosystem I assembly proteins; the ccsA (involved
in c-type cytochrome synthesis) and cemA (chloroplast
envelope membrane protein) genes are also lost. All ndh genes
are completely lost or turned into pseudogenes. The latter
does not, however, seem to be related to the heterotrophic
way of life, since it is characteristic of several photosynthetic
angiosperms, including orchids (Chang et al. 2006; Wu et al.
2010; Blazier et al. 2011), and of gymnosperms (Braukmann
et al. 2009). Also, the Neottia plastome apparently lacks a
functional matK gene. It contains a region with high similarity to
matK, but its comparison with orthologous sequences from
photosynthetic orchids reveals strong divergence of its
5' end (including a substitution in the start codon); this
suggests that in Neottia, matK is a pseudogene. Despite this,
the genes that are supposed to require matK protein activity
for the splicing of their mRNAs (rpl2, rps12, clpP, trnA-UGC,
trnG-UCC) have retained their introns. This is in contrast
with the genus Cuscuta, where the matK loss observed in the
species of subgenus Grammica is correlated with the loss
of group IIA introns (McNeal et al. 2009). There are two
possible explanations. First, the matK-like
region retained in Neottia may still be functional; this is possible
if an alternative start codon is adopted
or if multiple RNA editing events occur in the 5' end of this
region. Second, the involvement of matK in the splicing of
group IIA introns may not be as essential in orchids as it is in
Cuscuta (and, apparently, in most plants). This hypothesis is
supported by the fact that Rhizanthella, another nonphotosynthetic
orchid, totally lacks any matK-like regions but also
retains group IIA introns, and that matK is pseudogenized in
Corallorhiza trifida (Freudenstein and Senyo 2008), a
mycoheterotrophic species that partially retains photosynthetic
activity.

The only class of genes that is unaffected by the reduction
is the ribosomal RNA genes. All four rRNA genes characteristic
of typical plant plastomes are present in Neottia and
share very high similarity (96-99%) with their orthologs
from other orchids. Genes encoding another essential
component of the ribosome, the ribosomal proteins, are also
mostly retained: the Neottia plastome encodes a complete set
of large subunit protein genes (including rpl23 and rpl22,
which are turned into pseudogenes in many photosynthetic
species) and most small subunit protein genes. The
functionality of two small subunit protein genes, rps16 and
rps18, is questionable. Although they are highly similar to
the orthologous genes from photosynthetic orchids, in silico
translation reveals internal stop codons in both sequences.
These stop codons are, however, in frame and thus are
potential targets of RNA editing, so additional experiments are
required to confirm the nonfunctionality of rps16 and rps18.
In Anthoceros formosae, conversion of nonsense codons
into sense codons is found in 52 genes, including rps18 (Kugita et al.
2003). The RNA editing system was shown to be active in
Rhizanthella, so it is presumably active in Neottia plastids too.
Another possible target of RNA editing is the rpl2 gene, which
has an atypical start codon, ACG. As for transfer RNA, we were
able to find sequences with high similarity to all tRNAs
characteristic of plant plastomes with the exception of
trnI-GAU. Most of them seem to be functional genes because
they share high similarity with their orthologs from
photosynthetic orchids (overall similarity of tRNA sequences
between Neottia and Phalaenopsis is 0.95) and have a
conserved secondary structure typical of tRNAs. trnV-UAC
and trnP-UGG are putative pseudogenes since they
differ from their orthologs by multiple substitutions and indels.
The infA gene, which encodes translation initiation factor 1, is
also present. In contrast to the translation apparatus, which seems
to be almost unaffected, genes of the transcription machinery
(those encoding plastid RNA polymerase subunits)
are lost (rpoA, rpoC1) or pseudogenized (rpoB,
rpoC2). Two genes involved in plastid metabolism, accD
and clpP, are retained, as well as two large ORFs encoding
proteins of unknown function, ycf1 and ycf2. The
retention of ycf1 is the most enigmatic since it is presumably
pseudogenized in the photosynthetic orchids Oncidium and
Phalaenopsis. Regions with strong similarity to ycf1
are present in their plastomes, but their in silico translation
reveals multiple stop codons due to nontriplet indels.
Pseudogenization of ycf1, as well as of another large plastid ORF,
ycf2, occurs in grasses, where a region similar to ycf1 is present
but highly reduced due to multiple deletions (Hiratsuka
et al. 1989). The structure of ycf1 in Phalaenopsis and
Oncidium may represent a first stage of gene degradation.
Another explanation of the frameshifts might be
sequencing errors, because ycf1 is rich in homopolymer
regions, which are prone to errors. A broader survey of ycf1
integrity in orchids is required to see if this gene is indeed
nonfunctional in either lineage of orchids.
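
The in silico translation check mentioned above for rps16, rps18, and ycf1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the codon table is the standard one, which assigns the same amino acids to codons as the plastid code (translation table 11). An in-frame stop flagged this way only marks a candidate pseudogene, since RNA editing could still restore a sense codon.

```python
# Standard genetic code in TCAG order; plastid table 11 assigns the same
# amino acids to codons (it differs only in the permitted start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence codon by codon; 'X' marks unknown codons."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stop_positions(cds):
    """Return 0-based codon indices of stop codons other than the final codon."""
    protein = translate(cds)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


# Hypothetical toy sequence (not from the Neottia plastome): ATG AAA TAA GGG TAA
print(internal_stop_positions("ATGAAATAAGGGTAA"))
```

A nonempty result means the reading frame is interrupted before its end; whether the gene is truly nonfunctional would still need experimental confirmation, as the text notes.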

Despite a reduction in length of about one-third, the overall
structure of the Neottia plastid genome is conserved and collinear
with that of photosynthetic orchids. The only alteration
of the gene order is the position of the inverted repeat (IR)-large
single copy (LSC) border, which resides in the rps3-rpl16/rps3-trnK
spacer in Neottia. Expansions of the IR are well documented
in different lineages of angiosperms; in particular,
in orchids the IR was found to include the complete sequence of
rps19, trnH, and a part of rpl22 (Wang et al. 2008). In Neottia,
further expansion of the IR is observed: it includes rpl22 and
rps3. Such a structure of the IR-LSC junction (JIR-LSC) is so far unique; however, it is
possible that increased taxon sampling will reveal similar
structures in other orchids, either photosynthetic or not.

To assess the selection constraint on plastid genes in
Neottia, we analyzed synonymous and
nonsynonymous substitutions in protein-coding genes.
Rhizanthella, another nonphotosynthetic orchid, was also
included in the analysis. Examination of the dN/dS ratio for
individual genes that are shared between Neottia,
Rhizanthella, and photosynthetic orchids demonstrates that most
of them are evolving under purifying selection in both
nonphotosynthetic species (fig. 2). The only Neottia gene where
dN/dS is greater than 1 is rpl23. The sequence of this gene is
highly conserved between Phalaenopsis and Oncidium,
differing by a single (nonsynonymous) substitution. rpl23 is
a pseudogene in several photosynthetic plants, for example,
Caryophyllales (Logacheva et al. 2008). High dN/dS in this
gene may indicate relaxation of the selection constraint
as an initial stage of pseudogenization. However, in rps14 and
rpl33, dN/dS is found to be increased in photosynthetic
species. Such an increase may be caused by the unreliability of
dN/dS estimates for short genes (Haddrill et al. 2007) and thus
may not be biologically relevant. To get a generalized view of the
substitution rate in Neottia and Rhizanthella, we calculated
substitution values for all genes combined in one sequence.
This demonstrated that dN/dS ratios are very similar
between the two photosynthetic species and between
photosynthetic and nonphotosynthetic ones. In contrast, the number
of both synonymous and nonsynonymous substitutions
differs greatly, being about 2.5 times higher in Neottia and 5
times higher in Rhizanthella (fig. 3). A higher
substitution rate is often observed in annual plants when
compared with perennials (e.g., Yue et al. 2010); however,
both Neottia and Rhizanthella are perennials, as are
Phalaenopsis and Oncidium. This suggests that the effect is
related to their heterotrophic way of life and that plastid genes
in nonphotosynthetic orchids have a higher mutation rate but
retain their functionality.

Fig. 2. Pairwise dN/dS (the ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous
substitutions per synonymous site) for all genes shared between photosynthetic orchids and Neottia and/or Rhizanthella. N/A indicates the cases where
it was not possible to estimate the ratio due to the absence of synonymous substitutions or the absence of the gene in the plastid genome.




Fig. 3. dN/dS ratio and dN and dS values for all genes shared
between Neottia, Rhizanthella, Oncidium, and Phalaenopsis, combined
in one sequence.
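
The reasoning behind the combined-gene comparison (fig. 3) can be made concrete with a small Python sketch: if dN and dS are both raised by the same factor, their ratio is unchanged, which is the pattern expected for an elevated substitution rate without relaxed purifying selection. The reference values below are hypothetical placeholders, not data from this study; only the fold increases (about 2.5 and 5) come from the text, and treating them as identical for dN and dS is an idealization.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# Hypothetical reference values for a photosynthetic orchid; NOT measured data.
REF_DN, REF_DS = 0.02, 0.10

# Approximate fold increases in substitution numbers reported in the text.
FOLD_INCREASE = {"Neottia": 2.5, "Rhizanthella": 5.0}

for name, fold in FOLD_INCREASE.items():
    dn, ds = REF_DN * fold, REF_DS * fold
    # Both rates rise, yet the ratio stays at the reference value.
    print(name, "dN:", dn, "dS:", ds, "dN/dS:", round(omega(dn, ds), 3))
```

Returning None for a zero dS mirrors the N/A cases in fig. 2, where the ratio could not be estimated because no synonymous substitutions were present.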



The broader comparison that includes all nonphotosynthetic
higher plant plastomes (two orchids, the parasitic dicot
Epifagus, and the liverwort Aneura) reveals both parallelisms
and dissimilarities in their structure. The plastome sizes differ
almost twofold, from about 108 kb in Aneura to 59,190 bp in
Rhizanthella. The degree of gene loss and pseudogenization
also differs: in Aneura, the plastome is much less affected
than in angiosperms. It retains all transfer RNA, RNA
polymerase, and ribosomal protein genes and also many
photosynthesis-related genes (Wickett, Zhang, et al. 2008), being
similar in this respect to the species of Cuscuta that retain
photosynthetic activity (Funk et al. 2007; McNeal et al.
2007). In strictly nonphotosynthetic angiosperms, not a single
photosynthesis-related gene is retained as an intact reading
frame, even in Neottia, whose plastome shows the least
reduction in both size and gene content and is only 16 kb smaller
than that of Aneura. In general, the structure and gene content
of the plastome are more similar among nonphotosynthetic
angiosperms than between any of them and Aneura. It is
well known that the activity of the plastid is not determined by
the plastid genome alone but is also highly dependent on and
coordinated with that of the nuclear genome (reviewed in Taylor
1989 and Woodson and Chory 2008). The results from an
ongoing genomic project on a liverwort, Marchantia
polymorpha, indicate that the organization of the nuclear
genomes in angiosperms and in liverworts differs greatly; in
particular, the diversity of gene families is lower in
Marchantia (Yamato et al. 2011). This and other differences in
nuclear genome organization can account for the difference in
the structure of plastid genomes in nonphotosynthetic
angiosperms and nonphotosynthetic liverworts. Another possible
explanation is the time elapsed since the switch to
heterotrophy. Though there are no direct (paleontological)
observations on this subject, we suggest that heterotrophy
is the most recent in A. mirabilis, since other species of
the genus Aneura are photosynthetic (Wickett, Fan, et al.
2008), and the most ancient in Epifagus, which shares the
holoparasitic way of life with the related genera Conopholis
and Orobanche (dePamphilis et al. 1997). Neottia and
Rhizanthella represent genera that include only nonphotosynthetic
species, but their most closely related genera are
completely or partially photosynthetic. Moreover, in some
systematic treatments, Neottia is merged with the photosynthetic
genus Listera (Chase et al. 2003), which suggests a recent
switch to heterotrophy, similar to Aneura. In terms of gene
content, Neottia is more similar to Epifagus than to
Rhizanthella (fig. 4) or Aneura. Phylogenetically, Neottia is close to
Rhizanthella; thus the dissimilarity of their plastid genomes is
unlikely to be related to major dissimilarities in the organization
of their nuclear genomes but rather to lineage-specific
variations in the rate and pattern of plastid genome evolution.
This phenomenon is well known with regard to single nucleotide
substitutions (reviewed in Muse 2000), and it is likely to
be applicable to length mutations as well.

Fig. 4. Gene and pseudogene content in the plastomes of nonphotosynthetic angiosperms. Numbers indicate different groups of genes/
pseudogenes according to their presence in the plastomes: (1) genes that are present in all plastid genomes of nonphotosynthetic angiosperms, (2)
pseudogenes that are present in all plastid genomes of nonphotosynthetic angiosperms, (3) genes that are unique to Neottia, (4) pseudogenes that are
unique to Neottia, (5) genes present only in Neottia and Rhizanthella, (6) pseudogenes present only in Neottia and Rhizanthella (none), (7) genes unique to
Rhizanthella (none), (8) pseudogenes unique to Rhizanthella, (9) genes present only in Rhizanthella and Epifagus (none), (10) pseudogenes present only in
Rhizanthella and Epifagus, (11) pseudogenes unique to Epifagus, (12) genes unique to Epifagus, (13) pseudogenes present only in Epifagus and Neottia,
and (14) genes present only in Epifagus and Neottia.

Among the approximately 100 genes encoded in a typical plastid
genome, only 29 are shared by all the plastid genomes of
nonphotosynthetic plants. This core set of genes consists
of ribosomal and transfer RNA and ribosomal protein genes,
translation initiation factor (infA), protease subunit (clpP),
acetyl-CoA carboxylase subunit (accD) genes, and two large
ORFs of unknown function, ycf1 and ycf2 (fig. 4). This is in
contrast with the situation observed in experimental systems
of transition to heterotrophy. Calli cultured for a long time
on sucrose-rich media also undergo multiple deletions in
plastid DNA and extensive gene losses. However, the size
and location of these deletions differ drastically among calli samples and
do not seem to converge to a gene set shared by all samples.
Moreover, the loss of the circular structure typical of plastid
DNAs is reported (Harada et al. 1992; Abe et al. 2002).
The stability of plastid genomes of nonphotosynthetic
plants (the conservation of the typical quadripartite structure,
the existence of a shared gene set, and the evidence of
purifying selection acting on these genes) suggests that the
evolution of plastid genomes in nonphotosynthetic plants
is dictated by constraints that are obviously different
from those acting on photosynthetic plant plastomes but
are no less strong.

Orchidaceae is a perfect model system for deeper study of 
the evolution of nonphotosynthetic plants' plastomes be- 
cause it contains many cases of independent transition to 
nonphotosynthetic lifestyle (Molvray et al. 2000), from 
rather ancient occurring at the level of tribe to extremely 
recent occurring in certain individuals within the population 
(Tranchida-Lombardo et al. 2010). Moreover, many photo- 
synthetic orchids are able to use fungi as secondary carbon 
source being mixotrophic rather than fully autotrophic 
(Bidartondo et al. 2004; Abadie et al. 2006). Mixotrophy 
is thought to be the preadaptation that mediates the tran- 
sition to completely heterotrophic lifestyle (Selosse and 
Roy 2009). It would be interesting to determine whether any of the changes
in the plastome characteristic of strictly nonphotosynthetic
plants (e.g., gene loss, increase of substitution rate)
are observed in mixotrophic species. We expect that further
sampling of the plastid genome sequences from
the species of Orchidaceae representing different stages 
and different times of transition to mycoheterotrophy will 
provide valuable information about the evolution of plas- 
tomes in nonphotosynthetic plants. 

Materials and Methods 

Total DNA was extracted from the above-ground part of
three Neottia plants using the NucleoSpin Plant II kit
(Macherey-Nagel, Germany). DNA (10 µg) was used for sequencing
with the Roche Genome Sequencer FLX system using the
Titanium kit (454 Life Sciences). Sequencing was performed at
the University of Illinois at Urbana-Champaign, W.M. Keck
Center for Comparative and Functional Genomics. The
sample was run on half of a picotiter plate. The sequencing
resulted in 590,640 reads with an average read length of
528 bp. The output from the sequencing system in Standard
Flowgram Format was converted to FASTA. Then a BLAST
database was made from the resulting FASTA file, and the
plastome of Oncidium, a photosynthetic orchid, was queried
against it. All reads with an e-value lower than 10⁻¹⁰ were used
for de novo assembly using MIRA assembler ver. 3.2.0
(Chevreux et al. 1999). The assembly resulted in 23 contigs
with a length of more than 1,000 nucleotides. The two largest
contigs (11,141 and 49,963 nucleotides) were used as a basis
for the generation of a draft version of the Neottia plastome.
They were aligned with the Oncidium plastome using the
whole-genome VISTA alignment tool (Frazer et al. 2004). Then
Sanger sequencing was used to fill the gaps. To check
the accuracy of the assembly and to correct possible 454
sequencing errors associated with homopolymer runs, several
regions were sequenced by Sanger sequencing using
primers designed on the basis of the assembly (supplementary
table S1, Supplementary Material online). Average
coverage for regions derived from the 454 assembly is estimated
at 29.8x; for regions sequenced by Sanger sequencing it is
2x. Polymerase chain reaction (PCR) amplification was
performed on a Biometra T300 thermal cycler using the Encyclo PCR
kit (Evrogen, Russia). PCR conditions were as follows: initial
denaturation for 3 min at 94 °C, then 35 cycles of 15 s at 94 °C,
25 s at 59 °C, and 1-5 min (depending on the expected
length of the product) at 72 °C. Sanger sequencing was
performed in the interinstitutional sequencing center at the
Engelhardt Institute of Molecular Biology (Moscow, Russia) using the
ABI PRISM BigDye Terminator kit v. 3.1 with subsequent
analysis on an ABI PRISM 3730 genetic analyzer (Applied
Biosystems). Initial annotation was produced using DOGMA
(Wyman et al. 2004). Then manual correction and adjustment,
which included alignment of every Oncidium and
Phalaenopsis gene with the Neottia plastome sequence, was
performed. Regions with similarity to known protein-coding
genes but lacking an intact ORF were classified as
pseudogenes. To detect tRNA pseudogenes, each tRNA-like
sequence was analyzed using tRNAscan-SE (Lowe and Eddy
1997) and by comparison with its putative orthologs from
Oncidium and Phalaenopsis. Sequences that lack typical
tRNA folding and/or differ from their orthologs by multiple
indels or substitutions were considered to be
pseudogenes. The map of the Neottia plastome was visualized
using the OGDRAW online tool (Lohse et al. 2007) with further
manual correction. The assembled and corrected sequence of
the Neottia plastome was deposited in GenBank under
accession number JF325876.
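
The read-selection step (keeping reads that match the Oncidium plastome with an e-value below 10⁻¹⁰) could be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the paper does not say which BLAST output format was produced, so the sketch assumes the default 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), in which the reads are the subjects because the reads formed the database. The function name and the example hit lines are hypothetical.

```python
def reads_passing_evalue(blast_tabular_lines, max_evalue=1e-10):
    """Collect subject (read) IDs with at least one hit below the e-value cutoff.

    Assumes 12-column tabular BLAST output where column 2 is the subject ID
    and column 11 is the e-value.
    """
    selected = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        read_id, evalue = fields[1], float(fields[10])
        if evalue < max_evalue:
            selected.add(read_id)
    return selected


# Hypothetical hit lines for illustration only.
example_hits = [
    "oncidium_cp\tread1\t98.0\t500\t10\t0\t1\t500\t1\t500\t1e-150\t900",
    "oncidium_cp\tread2\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t60",
    "oncidium_cp\tread3\t99.0\t400\t4\t0\t600\t999\t1\t400\t0.0\t750",
]
print(sorted(reads_passing_evalue(example_hits)))
```

Using a set means that a read with several hits to the reference is passed to the assembler only once.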

For dN/dS calculation, all protein-coding plastid genes
that are shared between Neottia, Rhizanthella, Oncidium,
and Phalaenopsis were included in the analysis. Alignment
was performed using ClustalW (Thompson et al. 1994);
dN/dS ratios were calculated using the codeml program from the
PAML 4.3 package (Yang 2007). In the alignments of several
genes, nontriplet indels or in-frame stop codons were found
near the 3' end. In these cases, only the part of the sequence
before the presumed indel or stop codon was used for
dN/dS calculation.
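
The trimming rule described above (use only the part of an alignment that precedes a nontriplet indel or an in-frame stop codon) can be expressed as a short Python sketch. It is one possible reading of the rule, not the authors' code: the function names are hypothetical, gaps are assumed to be written as "-", and the first alignment column is assumed to be the first codon position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_gap_frameshift(seq, length):
    """Return the start index of the first gap run whose length is not a
    multiple of three, or None if every gap run keeps the reading frame."""
    i = 0
    while i < length:
        if seq[i] != "-":
            i += 1
            continue
        j = i
        while j < length and seq[j] == "-":
            j += 1
        if (j - i) % 3 != 0:
            return i
        i = j
    return None


def usable_prefix_length(aligned_seqs):
    """Length (a multiple of 3) of the alignment prefix that precedes the first
    in-frame stop codon or nontriplet gap run found in any sequence."""
    length = min(len(s) for s in aligned_seqs)
    length -= length % 3
    cutoff = length
    for seq in aligned_seqs:
        seq = seq.upper()
        for i in range(0, length, 3):
            if seq[i:i + 3] in STOP_CODONS:
                cutoff = min(cutoff, i)
                break
        gap_start = first_gap_frameshift(seq, length)
        if gap_start is not None:
            # Round down to the codon boundary before the frameshifting gap.
            cutoff = min(cutoff, gap_start - gap_start % 3)
    return cutoff


def truncate_alignment(aligned_seqs):
    """Trim every aligned sequence to the shared usable prefix."""
    cutoff = usable_prefix_length(aligned_seqs)
    return [s[:cutoff] for s in aligned_seqs]
```

Because the cutoff is taken across all sequences, the trimmed alignment stays rectangular and codon-aligned, which is what codon-based programs such as codeml expect as input.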

Supplementary Material 

Supplementary table S1 is available at Genome Biology and 
Evolution online (http://www.gbe.oxfordjournals.org/). 